Nucleic acid molecules coding for a dextran-saccharase catalysing the synthesis of dextran with α 1,2 osidic sidechains

ABSTRACT

The invention relates to an isolated polypeptide with a glycosyltransferase enzymatic activity for producing dextrans with α(1→2) side chains, comprising at least one glucan binding region and a catalytically active region situated downstream of the glucan binding region. The invention further relates to polynucleotides coding for said enzymes and vectors containing the same.

The present invention relates to the field of glycotechnology, more particularly to the synthesis of oligosaccharides or oligosides with a prebiotic, therapeutic or diagnostic effect.

The present invention pertains to nucleic acid molecules encoding an enzyme having a glycosyltransferase activity catalyzing the synthesis of dextrans or oligosides carrying α(1→2) osidic type linkages.

The invention also pertains to enzymes synthesized by the nucleic acids of the invention, and to their expression systems in prokaryotic or eukaryotic cells. Finally, it pertains to the use of said enzymes in the production of oligosaccharides in foodstuffs, or as an active principle in therapeutic and/or cosmetic products.

Oligosides and heterooligosides act as recognition and effector signals in both animals and plants (as oligosaccharins) by specifically binding to lectins, glycosyltransferases, glycosidases, adhesion molecules etc. The antigenic determinants of blood groups are osidic, and our defense against many pathogenic bacteria is directed against osidic structures of the bacterial envelope. Further, one of the major reasons for xenograft rejection is the existence of osidic structures specific to each species. Such properties, and the knowledge acquired in recent years regarding glycosyltransferases and lectins, contribute to making certain oligosides the candidates of choice for therapeutic or prophylactic treatment of disorders linked to the microbiological equilibrium of various organs such as the intestine or skin. As an example, oligosides constitute an interesting alternative to the use of micro-organisms and antibiotics in regulating the composition of intestinal flora (prebiotic effect). Certain oligosides can be considered to be “soluble fiber” when they are not metabolized by human and animal digestive enzymes; on reaching the colon, they interact with the microbial flora and specifically affect the growth and adhesion of certain species. If they are incorporated into food in low doses (less than 1%), certain osidic molecules improve health and stimulate weight gain in animals.

A review of different glycosyltransferases, their structure and their activity, has been carried out by Vincent Monchois et al (1). Briefly:

a) it appears that the structure of the glycosyltransferases and/or dextransucrases studied is highly conserved and is constituted, starting from the amino part of the protein, by a signal sequence, a variable domain, a catalytic domain and a glucan binding domain.

b) glucooligosides (GOS) can be synthesized by glycosyltransferases such as dextransucrases from cheaper substrates such as saccharose and in the presence of a glucose accepting sugar. Other substrates such as α-D-fluoroglucose, para-nitrophenyl-α-D-glucopyranoside, α-D-glucopyranoside-α-D-sorbofuranoside or 4-O-α-D-galactopyranosylsucrose can also be used.

Starting from the substrate, such enzymes catalyze the transfer of glucose units onto acceptor molecules. In the presence of a glucose acceptor such as maltose or isomaltose, glycosyltransferases catalyze the synthesis of low molecular weight oligosaccharides primarily comprising chains with 3 to 7 glucoses, in accordance with the reaction:

Saccharose + acceptor → glucosylated acceptor + fructose

In such cases, enzymes generally have a specificity for the synthesis of osidic bonds in accordance with that forming the donor polymer.

In contrast, in the absence of an acceptor, the enzyme synthesizes high molecular weight dextran type glucans by successive transfer of α-D-glucopyranosyl units from saccharose in accordance with the reaction:

n saccharose → (glucose)_n + n fructose
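The stoichiometry of the acceptor-free reaction can be illustrated with a short mass balance. The following is a minimal sketch only: the molar masses are standard textbook values rather than figures from the present text, the function name `theoretical_yields` is hypothetical, and complete conversion of saccharose is assumed.

```python
# Molar masses in g/mol (standard values, not taken from the present text).
M_SUCROSE = 342.30         # C12H22O11 (saccharose)
M_FRUCTOSE = 180.16        # C6H12O6
M_ANHYDROGLUCOSE = 162.14  # C6H10O5, one glucosyl unit inside a glucan chain


def theoretical_yields(sucrose_g: float) -> tuple[float, float]:
    """Return (glucan_g, fructose_g) for complete conversion of saccharose.

    Follows n saccharose -> (glucose)n + n fructose and ignores the single
    water molecule associated with the chain ends (negligible for large n).
    """
    moles = sucrose_g / M_SUCROSE
    return moles * M_ANHYDROGLUCOSE, moles * M_FRUCTOSE


glucan_g, fructose_g = theoretical_yields(100.0)
print(f"{glucan_g:.1f} g glucan, {fructose_g:.1f} g fructose")
```

Under these assumptions a little under half of the saccharose mass can end up in the glucan, the remainder being released as fructose.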

c) The structure and function of glucans or oligosides synthesized by glycosyltransferases depend on the producing bacterial strain.

Throughout the present text, the generic term “glycosyltransferases” is used to designate the different enzymes capable of catalyzing the synthesis of glucose polymers from saccharose. They are generally produced by bacterial strains of the Leuconostoc, Lactococcus, Streptococcus or Neisseria type. The size and structure of the glucans produced depend on the producing strain.

The glucose units are coupled by α(1→6) osidic bonds as in dextran, by α(1→3) bonds as in the case of mutan, or by alternations of the two types (alternan).

Similarly, the existence and nature of the linkages, their length and position vary depending on the origin of the producing strain.

Glycosyltransferases producing glucans or GOSs containing at least 50% α(1→6) bonds are termed dextransucrases. GOSs synthesized by said enzymes may carry α(1→2), α(1→3) and/or α(1→4) linkages. Said dextransucrases are produced by Leuconostoc mesenteroides type bacteria.
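The 50% α(1→6) criterion can be written as a simple test on a linkage composition. This is an illustrative sketch: the function names, the linkage labels such as "a1-6" and the example composition are all hypothetical and are not measurements from the present text.

```python
def is_dextran_type(linkage_percent: dict[str, float]) -> bool:
    """True when at least 50% of the osidic bonds are alpha(1->6)."""
    return linkage_percent.get("a1-6", 0.0) >= 50.0


def other_linkages(linkage_percent: dict[str, float]) -> list[str]:
    """Linkage types other than alpha(1->6) that are present, most abundant first."""
    others = {k: v for k, v in linkage_percent.items() if k != "a1-6" and v > 0}
    return sorted(others, key=others.get, reverse=True)


# Hypothetical composition, chosen only to exercise the functions.
example = {"a1-6": 65.0, "a1-2": 30.0, "a1-3": 5.0}
print(is_dextran_type(example), other_linkages(example))
```

The threshold is applied to the percentage of all osidic bonds, which is the reading suggested by the sentence above.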

d) The dextransucrase from L. mesenteroides NRRL B-1299 can produce a highly branched dextran, the majority of linkages of which are of the α(1→2) type. Used in the presence of saccharose and maltose, a glucose acceptor molecule, it results in the formation of GOS, some of which have an α(1→2) bond at their non-reducing end and others of which have α(1→2) linkages on intermediate residues between the ends. For this reason, they resist degradation by enzymes (hydrolases) of the upper digestive tract in man and animals, and are only degraded by bacterial genera that are capable of fermenting them, such as Bacteroides and Bifidobacterium, considered to be beneficial to the host organism.

An identical phenomenon occurs in the skin, allowing cosmetic applications to be envisaged, since a lack of equilibrium of the cutaneous microbial flora is the root of numerous cosmetic and dermatological problems. For these reasons, they are designated “GOS of interest” in the present text.

Throughout the text, polysaccharides synthesized by the glycosyltransferases of the invention are either high molecular weight dextrans when the reaction is carried out without a glucose acceptor, or oligosides when the reaction is carried out in the presence of a glucose acceptor such as maltose or isomaltose, without this necessarily being specified. The functionality of the enzyme is characterized by the nature of the glucose-glucose bonds, [α(1→6), α(1→2)] or others, and not by the molecular weight of the polysaccharide that is synthesized.

Dextransucrases from L. mesenteroides already have a number of applications in industry, and in particular those from the NRRL B-1299 strain for which a method for synthesizing GOSs having α(1→2) linkages has been described in European patent EP-B1-0 325 872.

Marguerite Dols et al (2) showed that the GOS produced by dextransucrases from that strain are in fact a mixture of at least three similar families of molecules differing by the number and position of the α(1→2) type linkages, which leads to the hypothesis that different glycosyltransferase type enzymatic activities exist in that bacterial strain.

Because of the industrial interest pertaining to GOSs with α(1→2) linkages as summarized above in the field of prebiotic foodstuffs, in cosmetics or in pharmaceuticals, the present invention aims to isolate and characterize a particular enzyme from among those produced by L. mesenteroides NRRL B-1299 which more particularly would be involved in the synthesis of oligosides having α(1→2) linkages. The identification and characterization of such an enzyme have the advantage firstly of providing a uniform, reproducible method for producing GOSs of interest and secondly of identifying the essential characteristics of the producer enzyme for said GOSs of interest in order, if appropriate, to improve the performance of the products of the enzymatic reaction as a function of the envisaged use.

The technical problem underlying the present invention is thus to provide an enzyme and hence isolated nucleic acids encoding said enzyme to allow the improved production of GOS having α(1→2) linkages.

The present invention provides a technical solution to the various questions mentioned above by providing a novel dextransucrase, termed DSR-E, encoded by a gene endowed with a novel and unexpected structure (dsrE) capable of catalyzing the synthesis of glucans or oligosaccharides containing α(1→2) linkages.

Between the date of filing of the priority document, French patent number 0103631 in which the dextransucrase of the invention was termed DSR-D, and that of the present application, another dextransucrase, different from the enzyme of the invention, was described and also termed DSR-D. For this reason, in the present patent application, the dextransucrase described, claimed and shown in FIG. 1 b) is no longer designated DSR-D as in the priority document, but is termed DSR-E. In fact, the DSR-D dextransucrase of said priority document and DSR-E are completely identical.

The term “novel and unexpected structure” means that the organization of the protein differs from that of all other glycosyltransferases described until now (1), which have a catalytic domain located upstream of a glucan binding domain, the latter constituting the carboxylic portion of the protein.

The present invention thus concerns an isolated polypeptide having an enzymatic glycosyltransferase activity capable of forming dextrans having α(1→2) linkages, characterized in that it comprises at least one glucan binding domain and a catalytic activity domain located downstream of the glucan binding domain. The term “located downstream” means that the amino portion of the sequence with catalytic activity, or catalytic domain, is proximal to the carboxylic portion of the glucan binding domain. These two domains can be immediately contiguous or, in contrast, they may be separated by a variable domain.

The glycosyltransferase of the invention preferably comprises a signal peptide.

In one implementation of the invention, the glycosyltransferase comprises two catalytic domains located on either side of the glucan binding domain.

The presence of a domain with catalytic activity in the carboxylic portion of the enzyme is an essential characteristic of the latter in its capacity to form osidic α(1→2) bonds. In fact, as will be shown in the experiments described below, deletion of this domain in an enzyme having at least two catalytic domains results in the production of glucans or oligosides essentially having α(1→6) type osidic bonds and free of α(1→2) type bonds.

More precisely, the catalytic domain, as long as it is located downstream of a glucan binding domain, allows the synthesis of oligosides containing α(1→2) bonds.

Further, the experiments described below demonstrate that the specificity of the dextransucrase DSR-E function, namely its capacity to catalyze the formation of α(1→2) osidic bonds, can be attributed not to the concomitant presence of two catalytic domains but rather to the concatenation of a glucan binding domain and a catalytic domain, and more particularly the CD2 catalytic domain.

A comparative analysis of the different glycosyltransferases including dextransucrases has demonstrated a very high degree of conservation of their catalytic domain.

The catalytic domain located in the carboxy-terminal portion of the glycosyltransferase of the invention has a sequence having at least 44% identity and 55% similarity with the catalytic domains of the other analyzed glycosyltransferases. In particular, the catalytic domain in the carboxylic portion of the glycosyltransferase of the invention has at least 65% identity and at least 80% similarity with SEQ ID No: 1, the catalytic triad Asp/Glu/Asp in respective positions 230/268/342 being conserved.

Throughout the text, the term “X% similarity” with respect to a reference sequence means that X% of the amino acids are identical or modified by conservative substitution as defined in the ClustalW amino acid alignment software (http://bioweb.pasteur.fr/docs/doc-gensoft/clustalw/) and that (100-X)% can be deleted, substituted by other amino acids, or that (100-X)% can be added to the reference sequence. A particular primary structure of the enzyme of the invention is shown in SEQ ID No: 2, which represents a sequence of 2835 amino acids of a dextransucrase of L. mesenteroides NRRL B-1299.
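The distinction between identity and similarity can be made concrete on two rows of an existing pairwise alignment. The following is a hedged sketch, not the ClustalW implementation: the conservative-substitution groups are the "strong" groups commonly quoted for ClustalW/ClustalX and should be checked against the software's own documentation, and the function names and the choice to divide by the number of aligned reference residues are assumptions.

```python
# Conservative-substitution groups (assumed; quoted from the commonly cited
# ClustalW/ClustalX "strong" groups, to be verified against the software).
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]


def is_conservative(a: str, b: str) -> bool:
    """True when both residues belong to at least one common group."""
    return any(a in group and b in group for group in STRONG_GROUPS)


def identity_and_similarity(ref_aln: str, query_aln: str) -> tuple[float, float]:
    """Percent identity and percent similarity relative to the reference.

    Both arguments are rows of the same alignment, with '-' marking gaps.
    Columns where the reference has a gap are ignored.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned rows must have the same length")
    ref_len = identical = similar = 0
    for r, q in zip(ref_aln.upper(), query_aln.upper()):
        if r == "-":
            continue  # insertion relative to the reference
        ref_len += 1
        if r == q:
            identical += 1
            similar += 1  # identical residues also count as similar
        elif q != "-" and is_conservative(r, q):
            similar += 1
    if ref_len == 0:
        raise ValueError("reference row contains no residues")
    return 100.0 * identical / ref_len, 100.0 * similar / ref_len


# Toy alignment rows, invented for illustration only.
print(identity_and_similarity("MKDE", "MRDQ"))
```

In the toy example two of four residues are identical and the other two are conservative substitutions, so similarity exceeds identity, as in the 65% identity and 80% similarity figures quoted above.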

This dextransucrase, denoted DSR-E, like most glycosyltransferases and dextransucrases, has a signal sequence, a variable domain of low conservation, a highly conserved catalytic domain (CD1) and a glucan binding domain (GBD), followed by a second catalytic domain (CD2) in the carboxylic portion of the protein. DSR-E is the first glycosyltransferase analyzed that has two catalytic domains, in the configuration shown in FIG. 1 b). It is also the first glycosyltransferase the catalytic domain of which is located in the carboxylic portion of the protein.
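The domain order just described, and the "located downstream" criterion, can be captured in a small data structure. This is a sketch under stated assumptions: the `Domain` class, the `kind` labels and the function name are hypothetical, and only the order of the domains is modelled because no domain boundaries are given in this passage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    kind: str  # "signal", "variable", "catalytic" or "gbd" (illustrative labels)
    name: str


# Domain order from the amino end to the carboxylic end.
CLASSICAL = [
    Domain("signal", "signal sequence"),
    Domain("variable", "variable domain"),
    Domain("catalytic", "catalytic domain"),
    Domain("gbd", "glucan binding domain"),
]
DSR_E = [
    Domain("signal", "signal sequence"),
    Domain("variable", "variable domain"),
    Domain("catalytic", "CD1"),
    Domain("gbd", "GBD"),
    Domain("catalytic", "CD2"),
]


def has_catalytic_domain_downstream_of_gbd(domains: list[Domain]) -> bool:
    """True when some catalytic domain follows a glucan binding domain."""
    seen_gbd = False
    for domain in domains:
        if domain.kind == "gbd":
            seen_gbd = True
        elif domain.kind == "catalytic" and seen_gbd:
            return True
    return False


print(has_catalytic_domain_downstream_of_gbd(CLASSICAL))  # classical layout
print(has_catalytic_domain_downstream_of_gbd(DSR_E))      # DSR-E layout
```

Removing the last element of `DSR_E` (a CD2 deletion) makes the test fail, which mirrors the statement that deleting this domain abolishes the distinctive architecture.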

FIG. 1 b) also shows that the glucan binding domain is substantially longer than that described above for known dextransucrases; thus, a further characteristic of the enzymes of the invention is the size of this domain which is over 500 amino acids.

A comparison and analysis of the DSR-E sequence with the sequences of the glycosyltransferases or dextransucrases that have already been described (1), and the means used to this end, are indicated in Example 2 detailed below. It clearly shows that while the existence of two catalytic domains substantially differentiates DSR-E from other enzymes, in contrast the sequences of said domains are substantially conserved. In particular, the amino acids necessary for catalytic activity are conserved in the second catalytic domain, namely the triad Asp/Glu/Asp located in respective positions 2210/2248/2322 of SEQ ID No: 2.
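The triad positions quoted for SEQ ID No: 1 (230/268/342) and for SEQ ID No: 2 (2210/2248/2322) can be checked for mutual consistency. The sketch below assumes, and this is an assumption rather than a statement of the text, that SEQ ID No: 1 is a contiguous fragment of SEQ ID No: 2, so that one constant offset relates the two numberings; the helper name is hypothetical.

```python
TRIAD_SEQ1 = (230, 268, 342)     # Asp/Glu/Asp positions stated for SEQ ID No: 1
TRIAD_SEQ2 = (2210, 2248, 2322)  # the same triad as numbered in SEQ ID No: 2

# A contiguous fragment implies one offset shared by all three residues.
offsets = {p2 - p1 for p1, p2 in zip(TRIAD_SEQ1, TRIAD_SEQ2)}
if len(offsets) != 1:
    raise ValueError("the two numberings are not related by a constant offset")
offset = offsets.pop()


def seq1_to_seq2(position_in_seq1: int) -> int:
    """Convert a SEQ ID No: 1 position to SEQ ID No: 2 numbering (assumed mapping)."""
    return position_in_seq1 + offset


print(offset, seq1_to_seq2(1))
```

The three differences agree, which is consistent with the assumed contiguity but does not by itself prove it.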

Thus, the invention also concerns any isolated polypeptide having a catalytic glycosyltransferase activity that is capable of forming dextrans or oligosaccharides having α(1→2) linkages as obtained by modification, substitution, insertion or deletion of amino acid sequences but comprising sequences having at least 80% and preferably at least 90% similarity with the following sequences of SEQ ID No: 2:

423-439      2120-2138
478-501      2161-2184
519-539      2202-2214
560-571      2243-2250
631-645      2315-2322
1014-1021    2689-2696

Preferably, finally, a polypeptide with catalytic activity of the invention contains the following amino acids:

W in positions 425 and 2122;

E in positions 430, 565, 2127 and 2248;

D in positions 487, 489, 527, 638, 2170, 2172, 2210 and 2322;

H in positions 637 and 2321;

Q in positions 1019 and 2694.
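A candidate sequence can be screened against this list of preferred residues with a few lines of code. This is a minimal sketch: the function name is hypothetical, positions are read as 1-based positions in SEQ ID No: 2 numbering, and the test sequence is a synthetic placeholder built for the demonstration, not SEQ ID No: 2.

```python
# Preferred residues and their 1-based positions, as listed in the text.
REQUIRED_RESIDUES = {
    "W": (425, 2122),
    "E": (430, 565, 2127, 2248),
    "D": (487, 489, 527, 638, 2170, 2172, 2210, 2322),
    "H": (637, 2321),
    "Q": (1019, 2694),
}


def missing_residues(protein: str) -> list[tuple[int, str, str]]:
    """Return (position, expected, found) for every requirement not met.

    'found' is '-' when the sequence is too short to reach the position.
    """
    problems = []
    for expected, positions in REQUIRED_RESIDUES.items():
        for pos in positions:
            found = protein[pos - 1] if pos <= len(protein) else "-"
            if found != expected:
                problems.append((pos, expected, found))
    return sorted(problems)


# Synthetic placeholder of 2835 residues carrying the required residues.
residues = ["A"] * 2835
for aa, positions in REQUIRED_RESIDUES.items():
    for pos in positions:
        residues[pos - 1] = aa
placeholder = "".join(residues)

print(missing_residues(placeholder))  # nothing missing in the placeholder
mutant = placeholder[:2209] + "N" + placeholder[2210:]  # D2210N, illustrative
print(missing_residues(mutant))
```

A real check would of course be run on an actual variant sequence aligned to the SEQ ID No: 2 numbering.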

The polypeptides with glycosyltransferase activity that can form osidic α(1→2) bonds can be in the isolated form or, in contrast, integrated into a larger protein such as a fusion protein. It may be advantageous to include sequences having another function, such as a specific tag sequence of a ligand that can facilitate purification. These tag sequences can be of the following types: GST (glutathione-S-transferase), intein-CBD (chitin-binding domain) (sold by New England Biolabs, http://www.neb.com), MBD (maltose binding domain), polypeptides containing contiguous histidine residues that can facilitate purification of the polypeptide with which it is fused. The skilled person could design any other fusion protein that could associate the function of the DSR-E of the invention with another function, a non-limiting example being a sequence increasing the stability of the enzyme produced by expression in a recombinant host or a sequence that can increase the specificity or efficacy of action of said enzyme, or a sequence aimed at associating another connected enzymatic activity.

Such fusion proteins also fall within the scope of the invention provided that they contain the CD2 domain and the glucan binding site. In the same manner, fragments of SEQ ID No: 2, comprising at least SEQ ID No: 1 and the glucan binding domain, alone or integrated into a larger polypeptide, form part of the invention, as long as the enzymatic activity of the dextransucrase is conserved.

Variations of the polypeptide sequences defined above also form part of the invention. In addition to the polypeptides obtained by conservative substitution of the amino acids defined above, the variations include polypeptides the enzymatic activity of which is improved, for example by directed or random mutagenesis, by DNA shuffling, or by duplication of the CD2 catalytic domain.

The particular structure of this enzyme identified in the present invention results from a process comprising:

a) identifying and isolating the dextransucrase from L. mesenteroides catalyzing the production of GOSs of interest carrying α(1→2) linkages;
b) sequencing the enzyme fragments;
c) synthesizing amplification primers that can amplify the corresponding gene of the producing strain or fragments thereof;
d) sequencing the amplified fragments;
e) cloning in specific vectors and expression in appropriate hosts.

The features of the method employed are given in detail in the experimental section below. The first step consists of separating the proteins by polyacrylamide gel electrophoresis and identifying bands having a dextransucrase activity by an in situ enzymatic reaction in the presence of substrate and acceptor. The nature of the GOSs synthesized is then identified for each band by HPLC analysis using the methods described in (1). The retention time for the oligosides in HPLC depends on the nature and organization of their osidic bonds. In particular, it is possible to distinguish between those constituted by residues having α(1→6) bonds, those having α(1→6) bonds with an α(1→2) linkage at the non-reducing end of the molecule, and the desired compounds having a linear α(1→6) chain with α(1→2) linkages.

The inventors therefore isolated and identified dextransucrase from L. mesenteroides NRRL B-1299 producing GOSs of interest.

A reverse engineering process carried out in steps b) to e) above then provides the nucleotide sequence encoding the enzyme, allowing industrial scale production and, if appropriate, allowing it to be modified, improving its performance using techniques that are available to the skilled person. As an example, directed or random mutagenesis or DNA shuffling can be cited (3).

The invention also pertains to an isolated nucleic acid molecule encoding an enzyme with glycosyltransferase activity that can form dextrans or oligosides having α(1→2) linkages and comprising at least one sequence encoding a glucan binding domain, and at least one nucleotide sequence encoding a catalytic domain located on the 3′ side of the foregoing, said sequence encoding a catalytic domain having at least 50% and preferably at least 70% similarity with SEQ ID No: 3.

The term “similarity” means that for the same reading frame, a given triplet is translated by the same amino acid. Thus, this term includes modifications to bases resulting in degeneracy of the genetic code.

The percentage similarity is determined by comparing a given sequence with the reference sequence. When they have different lengths, the percentage similarity is based on the percentage of nucleotides in the shortest sequence which are similar to those in the longest sequence.
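The triplet-based definition of nucleotide similarity can be sketched as follows. Several simplifications are assumptions of this example and not part of the text: both sequences are taken to start in the same reading frame and to be compared without gaps, the score is counted per codon of the shorter sequence rather than per nucleotide, and the function name is hypothetical. The codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_similarity(seq_a: str, seq_b: str) -> float:
    """Percent of codons of the shorter sequence that are translated into the
    same amino acid as the codon at the same position in the longer one."""
    shorter, longer = sorted((seq_a.upper(), seq_b.upper()), key=len)
    n_codons = len(shorter) // 3
    if n_codons == 0:
        raise ValueError("need at least one complete codon")
    same = 0
    for i in range(0, 3 * n_codons, 3):
        aa_short = CODON_TABLE.get(shorter[i:i + 3])
        aa_long = CODON_TABLE.get(longer[i:i + 3])
        if aa_short is not None and aa_short == aa_long:
            same += 1
    return 100.0 * same / n_codons


# GAT/GAC both encode Asp and GAA/GAG both encode Glu.
print(codon_similarity("GATGAAGAT", "GACGAGGAC"))
```

Two sequences that differ only by synonymous codons therefore score as fully similar even though their nucleotide identity is lower.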

The degree of similarity can be conventionally determined using software such as ClustalW (Thompson et al, Nucleic Acids Research (1994), 22: 4673-4680) distributed by Julie Thompson (Thompson@EMBL-Heidelberg.de) and Toby Gibson (Gibson@EMBL-Heidelberg.de) at the European Molecular Biology Laboratory, Meyerhofstrasse 1, D-69117, Heidelberg, Germany. ClustalW can also be downloaded from a number of websites including IGBMC (Institut de Génétique et de Biologie Moléculaire et Cellulaire, BP 163, 67404 Illkirch Cedex, France; ftp://ftp-igbmc.u-strasbg.fr/pub/) and EBI (ftp://ftp.ebi.ac.uk/pub/software/) and all sites linking to the Bioinformatics Institute (Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK).

The isolated nucleic acids of the invention can in particular comprise other sequences intended to improve the expression and/or activity of the enzyme produced.

As an example, they can be:

-   sequences encoding a signal sequence for their secretion;
-   duplication of the sequence encoding the CD2 catalytic domain.

Preferably, an isolated nucleic acid of the invention comprises:

a) two sequences encoding catalytic domains having at least 50%, preferably at least 80% similarity with SEQ ID No: 3;
b) a sequence encoding the glucan binding domain, the latter preferably being located between the two sequences in a).

A nucleic acid of the invention can also comprise:

-   a promoter suitable for its expression in a selected host cell;
-   a sequence encoding a signal peptide; and/or
-   one or more variable sequences;

said sequence or sequences all being located in the 5′ portion of sequences encoding the catalytic domain(s).

A more particular example of an isolated nucleic acid of the invention comprises:

a) SEQ ID No: 4;
b) a sequence having at least 80% similarity with SEQ ID No: 4; or
c) the complementary strand to sequence a) or b); or
d) a sequence hybridizing with a), b) or c).

The hybridization in d) is carried out under standard conditions, and preferably under stringent conditions. The term “hybridization under stringent conditions” means that there is at least 80% sequence identity with the sequence which is to be hybridized, preferably an identity of at least 90% of the sequence which is to be hybridized, under conditions which are, for example, described in Sambrook and Russell (3rd edition, 2001, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.).

The invention also concerns a gene encoding a dextransucrase that can form at least 15% α(1→2) linkages. In addition to the encoding sequence, the gene comprises sequences that allow transcription initiation and sequences that allow attachment of messenger RNA to the ribosome (RBS). SEQ ID No: 5 represents a gene structure as isolated from L. mesenteroides NRRL B-1299.

The nucleotides upstream of the translation initiation ATG are numbered 1 to 232.

The existence of an RBS sequence can be identified between nucleotides 218 and 223, as well as the consensus sequences −35 and −10 located between nucleotides 82 and 86 (TTGAA) on the one hand and 100 and 105 (ATAAAT) on the other hand.
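Because the coordinates above are 1-based and inclusive, extracting the corresponding regions from a sequence string requires a small conversion. The sketch below is illustrative only: the helper names are hypothetical and the upstream sequence is a placeholder of 232 unspecified bases ("N") into which the two motifs quoted in the text have been written; the RBS sequence itself is not given in this passage and is left unspecified.

```python
def region(seq: str, start: int, end: int) -> str:
    """Return seq[start..end] using 1-based, inclusive coordinates."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("coordinates outside the sequence")
    return seq[start - 1:end]


def place(seq: str, start: int, motif: str) -> str:
    """Write a motif into seq so that it begins at 1-based position start."""
    return seq[:start - 1] + motif + seq[start - 1 + len(motif):]


# Feature coordinates as stated in the text.
FEATURES = {"-35": (82, 86), "-10": (100, 105), "RBS": (218, 223)}

# Placeholder upstream region: 232 unspecified bases plus the quoted motifs.
upstream = "N" * 232
upstream = place(upstream, 82, "TTGAA")
upstream = place(upstream, 100, "ATAAAT")

for name, (start, end) in FEATURES.items():
    print(name, start, end, region(upstream, start, end))
```

The lengths of the quoted motifs (5 and 6 bases) match the lengths implied by the stated coordinate ranges.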

Any nucleic acid sequence that can be hybridized with DNA of SEQ ID No: 4 or its complementary strand is capable of encoding an enzyme having the properties and characteristics of the enzyme of the invention. This applies to natural sequences existing in micro-organisms other than L. mesenteroides NRRL B-1299 and isolated from gene libraries of micro-organisms, and to those prepared by genetic engineering or by chemical synthesis.

In particular, the sequences upstream of the translation initiation ATG and necessary for expression of the protein can advantageously be substituted by transcription initiation and/or ribosome binding sequences suitable for the expression system selected for the coding sequence.

A nucleic acid sequence that is capable of hybridizing under stringent conditions with the isolated nucleic acid of the invention also comprises fragments, derivatives or allelic variants of the nucleic acid sequence of the invention which encodes a protein having the enzymatic activity described above. Thus, the fragments are defined as fragments of nucleic acid molecules that are sufficiently long to encode a protein that has conserved its enzymatic activity. This also encompasses fragments that are free of the sequence encoding the signal peptide responsible for protein secretion.

The term “derivative” means a sequence that is different from the original sequence in one or more positions but which has a high degree of similarity with said sequence. In this context, “similarity” means at least 80% identity of the nucleotides, preferably at least 90% identity with the original sequence. The modifications in this case are deletions, substitutions, insertions or recombinations provided that the enzyme encoded by these homologous sequences has the enzymatic activity of the polypeptides of the invention.

The nucleic acid sequences of the invention as described above and qualified by derivatives of said molecules as defined above are generally variations exerting the same biological function. Said variations can be natural variations, in particular those observed from one species to another and resulting in interspecies variability or, in contrast, those introduced via directed or random mutagenesis or by DNA shuffling.

Similarly, the invention encompasses isolated nucleic acids encoding a glycosyltransferase that can catalyze the synthesis of dextran or oligosaccharide carrying at least 20% and preferably at least 30% type α(1→2) linkages obtained by DNA shuffling and comprising:

-   a step for random modification of one of the sequences defined above, and in particular SEQ ID Nos: 3 and 4, and establishing the variations;
-   a step for expressing a variation from said modified sequences in a suitable host cell;
-   a step for screening hosts expressing an enzyme that can form more than 20% and preferably more than 30% α(1→2) bonds on a suitable substrate;
-   and a step for isolating the improved gene or genes.

An isolated nucleic acid of the invention can also comprise:

a) a sequence containing at least 80% similarity with the sequence encoding a dextransucrase expressed by the plasmid pCR-T7-dsrE in E. coli deposited at the CNCM on 15 March 2001 with accession number I-2649 (E. coli JM 109 [pCR-T7-dsrD]), or
b) a complementary sequence of the sequence in a).

The denomination of the strain transformed by the recombinant plasmid pCR-T7-dsrE deposited at the CNCM is that indicated above in brackets. It is not affected by the change in the denomination of the gene, which was carried out following deposition of said strain for the reasons given above.

The invention also concerns nucleic acid fragments as defined above, which are hybridizable with SEQ ID No: 4 and can be used as hybridization probes for detecting sequences encoding the enzymes of the invention. Said fragments can be prepared using any technique known to the skilled person.

In addition to hybridization probes, amplification primers also form part of the invention. Said primers are fragments which are hybridizable with SEQ ID No: 4 or with its complementary strand and which allow amplification of specific sequences encoding dextransucrases present in a prokaryotic or eukaryotic animal or plant organism.

The use of said amplification primers allows the use of a method for identifying the possible existence of a gene encoding an enzyme that can catalyze synthesis of GOS with α(1→2) linkages in said organism, said method also forming part of the invention.

The invention also concerns expression vectors comprising a nucleic acid as described above under the control of a sequence allowing its expression and preferably its secretion in prokaryotic or eukaryotic cells. The term “prokaryotic cells” preferably denotes bacteria selected from a group comprising E. coli, Lactococcus, Bacillus and Leuconostoc. The term “eukaryotic cells” preferably means eukaryotes selected from a group containing yeasts, fungi and plants.

The vector comprises a promoter suitable for expression of the isolated nucleic acid of the invention in the selected expression system. As an example, the T7 bacteriophage promoter could advantageously be selected for expression in E. coli.

The invention also concerns host cells, prokaryotic or eukaryotic, transformed by a nucleic acid of the invention, preferably comprised in an expression vector carrying a promoter, adapted for expression in the selected host cells. The transformed cells are selected from Gram-negative bacteria such as E. coli, or from Gram-positive bacteria such as Lactococcus, Bacillus, Leuconostoc, or from eukaryotes in a group comprising yeasts, fungi and plants.

One particular example of a cell transformed in accordance with the invention is the E. coli strain harboring a plasmid termed pCR-T7-dsrE carrying SEQ ID No: 4 under the control of the T7 bacteriophage promoter, deposited at the CNCM on 15 March 2001 under accession number I-2649.

The present invention also concerns a method for producing a glycosyltransferase that can form dextrans or oligosides having at least 15% and preferably at least 20% of type α(1→2) osidic linkages and comprising:

a) inserting a nucleic acid or a vector as defined above into a host cell that can express and preferably secrete the glycosyltransferase;
b) characterizing the enzymatic activity being investigated using any of the methods accessible to the skilled person;
c) purifying the enzyme from a cell extract.

The term “method for characterizing enzymatic activity known to the skilled person” means the methods described in the literature, for example in reference (2), and novel methods that may be developed to allow identification and discrimination of glucooligosaccharides having the desired degree of linkages.

In fact, it concerns any screening method that can identify the presence of α(1→2) linkages in a GOS.

Examples are:

-   HPLC, in which GOS migration varies as a function of the nature and position of the linkages, in particular those containing the α(1→2) bond at the non-reducing end and those containing this bond on the penultimate glucose; and/or
-   nuclear magnetic resonance (NMR);
-   the existence of a positive reaction with monoclonal antibodies specific for α(1→2) bonds on the non-reducing end and/or monoclonal antibodies specific for α(1→2) bonds on the penultimate glucose of the GOS.

The invention also concerns a method for obtaining a glycosyltransferase that can form oligosides or dextrans having a percentage of α(1→2) linkages of more than 15% and preferably more than 30% of the totality of the osidic bonds and comprising a step for modifying SEQ ID No: 4 by addition, deletion or mutation provided that:

-   the reading frame is not modified; and
-   the following amino acids are conserved after translation:
-   W in positions 425 and 2122, encoded by the TGG triplet in positions 1273 and 6364;
-   E in positions 430, 565, 2127 and 2248, encoded by GAA triplets in positions 1288, 1693, 6379 and 6742 respectively;
-   D in positions 487, 489, 527, 638, 2170 and 2210, encoded by GAT triplets in positions 1459, 1465, 1579, 1912, 6508 and 6628 respectively;
-   D in positions 2172 and 2322, encoded by GAT triplets in positions 6514 and 6964;
-   H in positions 637 and 2321, respectively encoded by the CAT triplet in position 1909 and CAC in position 6961;
-   Q in positions 1019 and 2694, respectively encoded by triplets CAA (position 3055) and CAG (position 8080).
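The amino acid positions and triplet positions listed above are related by simple arithmetic: if nucleotide 1 of the coding sequence is the first base of codon 1, the codon for residue k starts at nucleotide 3(k − 1) + 1. The sketch below checks every stated pair against this rule; the function name is hypothetical, and the assumption that the numbering of SEQ ID No: 4 begins at the first base of the first codon is inferred from the figures rather than stated in this passage.

```python
def codon_start(aa_position: int) -> int:
    """1-based position of the first nucleotide of the codon for a residue."""
    return 3 * (aa_position - 1) + 1


# (residue, amino acid position, codon, nucleotide position) as listed in the text.
STATED = [
    ("W", 425, "TGG", 1273), ("W", 2122, "TGG", 6364),
    ("E", 430, "GAA", 1288), ("E", 565, "GAA", 1693),
    ("E", 2127, "GAA", 6379), ("E", 2248, "GAA", 6742),
    ("D", 487, "GAT", 1459), ("D", 489, "GAT", 1465),
    ("D", 527, "GAT", 1579), ("D", 638, "GAT", 1912),
    ("D", 2170, "GAT", 6508), ("D", 2210, "GAT", 6628),
    ("D", 2172, "GAT", 6514), ("D", 2322, "GAT", 6964),
    ("H", 637, "CAT", 1909), ("H", 2321, "CAC", 6961),
    ("Q", 1019, "CAA", 3055), ("Q", 2694, "CAG", 8080),
]

mismatches = [row for row in STATED if codon_start(row[1]) != row[3]]
print(mismatches)  # an empty list means every stated coordinate fits the rule
```

The same rule explains why the reading frame must not be modified: an insertion or deletion whose length is not a multiple of three would shift every downstream codon and change the residues at these positions.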

A method for producing a glycosyltransferase according to the invention having the same characteristics as above can also comprise:

-   a step for randomly modifying SEQ ID No: 4 and establishing a library of variations;
-   a step for expressing a variation from said modified sequences in a suitable host cell;
-   a step for screening hosts expressing an enzyme that can form more than 15% and preferably more than 30% of α(1→2) bonds on a suitable substrate;
-   and a step for isolating the improved gene or genes.
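The screening step can be pictured as a filter over measured α(1→2) percentages. This is a hedged sketch: the `Variant` type, the function name, the tier labels and the library values are all hypothetical and are used only to show how the 15% and 30% thresholds from the text would be applied.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    name: str
    alpha12_percent: float  # measured share of alpha(1->2) bonds, in percent


def screen(variants: list[Variant], minimum: float = 15.0,
           preferred: float = 30.0) -> list[tuple[str, str]]:
    """Keep variants forming more than `minimum` percent alpha(1->2) bonds,
    best first, and flag those that also exceed `preferred`."""
    hits = [v for v in variants if v.alpha12_percent > minimum]
    hits.sort(key=lambda v: v.alpha12_percent, reverse=True)
    return [(v.name, "preferred" if v.alpha12_percent > preferred else "acceptable")
            for v in hits]


# Invented library, for illustration only.
library = [
    Variant("clone-1", 12.0),
    Variant("clone-2", 18.5),
    Variant("clone-3", 33.0),
]
print(screen(library))
```

Clones passing the filter would then go on to the isolation step for the improved gene or genes.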

In a further implementation of the invention, the method consists of modifying SEQ ID No: 3 by duplicating all or part of the CD2 catalytic domain.

It should be understood that the methods above are aimed not only at obtaining a glycosyltransferase that can form oligosides having a constant and reproducible percentage of α(1→2) linkages of more than 15% of the total linkages, but also at improving the degree of α(1→2) linkages with the aim of modifying the properties of the oligosides obtained to improve their dietetic properties or their capacity to maintain or re-establish bacterial flora associated with certain organs of the human or animal body.

Finally, the present invention concerns glycosyltransferases that can be obtained by a method as defined above and which can form at least 15% and preferably at least 30% of type α(1→2) osidic linkages in glucooligosaccharides.

Finally, the invention pertains to the use of glycosyltransferases of the invention as well as those that can be obtained by the methods mentioned above, in the production of a composition with a pre-biotic effect or in the manufacture of a dermatological, cosmetic or pharmaceutical composition.

Non-limiting examples that can be cited are an improvement in intestinal transit in animals and in man, an improvement in the assimilation of calcium and/or magnesium and of minerals in general, the prevention of cancer of the colon and the prevention or treatment of skin conditions such as acne, dandruff or body odor.

The advantage of the polypeptides and nucleic acids encoding said polypeptides of the invention is not only in improvements in terms of quality, yield, reproducibility and cost of glycosyltransferases that can form oligosaccharides with type α(1→2) osidic linkages, but also in producing novel enzymes the functionality of which is improved.

The figures, examples and detailed description below provide non-limiting illustrations of the particular characteristics and functionalities of polypeptides with enzymatic activity and sequences encoding them. In particular, they can illustrate more precisely the specificity of the catalytic domain present in the carboxy-terminal portion of the enzyme of the invention and its potential evolution to obtain improved enzymes.

KEY TO FIGURES

FIG. 1: Structure of native glycosyltransferases and derived recombinant proteins: FIG. 1 a) shows the structure of glycosyltransferases and dextransucrases described in the literature (1). PS: signal peptide; ZV: variable zone; CD: catalytic domain; GBD: glucan binding domain. FIG. 1 b) shows the structure of the glycosyltransferase of the invention. FIGS. 1 c) to 1 i) show different constructions comprising deletions in comparison with native DSR-E protein. Δ(PS) corresponds to the control constituted by the entire form cloned into the pBAD-TOPO thiofusion system (Invitrogen).

FIG. 2: Diagrammatic summary of the method for cloning the nucleotide sequence encoding a glycosyltransferase of the invention using a genome library by using a PCR probe described in Table I and a HindIII/EcoRV probe respectively.

FIG. 3: Comparison of the signal sequences of different glycosyltransferases of L. mesenteroides (residues 1-40 of SEQ ID NO: 2). The conserved amino acids are shown in bold. DSR-B: L. mesenteroides NRRL B-1299 (4) (SEQ ID NO: 45); DSR-S: L. mesenteroides NRRL B-512F (5) (SEQ ID NO: 46); ASR: L. mesenteroides NRRL B-1355 (6) (SEQ ID NO: 47).

FIG. 4: Alignment of 11 repeat sequences (SEQ ID NOS: 50-61) observed in the variable zone of the DSR-E enzyme.

FIG. 5: Alignment of conserved sequences in the catalytic domain (SEQ ID NOS: 6-17 and 62-103):

-   Block A: essential amino acids of the N-terminal portion of the catalytic domain;
-   Block B: amino acids of the catalytic saccharose binding domain;
-   Blocks C, D, E: blocks containing three amino acid residues involved in the catalytic triad (6);
-   Block F: sequence containing glutamine 937 of GTF-1 studied by Monchois et al (7).

The entirely conserved amino acids are indicated in bold. “*”: conservative substitutions; “:”: semi-conservative substitutions; “- - -”: gap. The numbering is that for SEQ ID No: 2.

FIG. 6: HPLC characterization of products synthesized by recombinant enzyme DSR-E.

6A: HPLC analysis of glucooligosaccharides obtained with dextransucrases of L. mesenteroides NRRL B-1299.

6B: HPLC analysis of glucooligosaccharides obtained by recombinant DSR-E. The following peaks are identified:

1: fructose;

2: maltose;

3: sucrose;

4: panose;

5: R4;

6: OD4;

7: R5;

8: OD5;

A, B, C: unidentified peaks.

6C: recombinant DSR-E from which the catalytic domain of the carboxy-terminal portion of the enzyme has been deleted (ΔDSR-E).

FIG. 7: HPLC analysis of the products of the acceptor reaction on maltose synthesized by the entire form and by different deleted forms of the DSR-E protein.

L.m. B-1299: mixture of dextransucrases produced by L. mesenteroides NRRL B-1299.

The peaks were identified as follows:

F: fructose;

M: maltose;

S: saccharose;

P: panose;

R4, R5: GOS comprising α(1→2) bonds;

OD4, OD5: GOS free of α(1→2) bonds.

MATERIALS AND METHODS

1) Bacterial Strains, Plasmids and Growth Conditions:

All strains were kept at −80° C. in tubes containing 15% glycerol (v/v).

Leuconostoc mesenteroides B-1299 (NRRL, Peoria, USA) was cultivated at 27° C. with stirring (200 rpm) on standard medium (saccharose 40 g/l, potassium phosphate 20 g/l, yeast extract 20 g/l, MgSO₄·7H₂O 0.2 g/l, MnSO₄·H₂O 0.01 g/l, NaCl 0.01 g/l, CaCl₂ 0.02 g/l, FeSO₄·7H₂O 0.01 g/l), the pH being adjusted to 6.9.

Escherichia coli DH5α and JM109 were cultivated on LB medium (Luria-Bertani).

Selection of pUC18 or pGEM-T Easy recombinant clones was carried out on LB-agar dishes supplemented with 100 μg/ml of ampicillin, 0.5 mM of isopropyl-β-D-thiogalactopyranoside (IPTG) and 40 μg/ml of 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside (X-gal). E. coli TOP 10 cells were used to clone the PCR product with the TOPO Cloning kit (Invitrogen) and were cultivated on LB medium supplemented with kanamycin at a concentration of 50 μg/ml.

Regarding expression of dsrE, the ECHO Cloning System cloning kit (Invitrogen) allows a PCR product to be cloned in a donor vector (pUNI/V5-His-TOPO), preceding a step for recombination with a suitable acceptor vector (pCR-T7-E). This system requires E. coli PYR1, TOP 10 and BL21(DE3)pLysS cells cultivated on LB medium supplemented with 50 μg/ml of kanamycin, as well as 34 μg/ml of chloramphenicol for the BL21(DE3)pLysS strain.

Digested and dephosphorylated pUC18 plasmids from Pharmacia (Amersham Pharmacia Biotech) were used to constitute the genomic DNA library of L. mesenteroides NRRL B-1299. Cloning of the PCR product necessitated the use of the pGEM-T Easy plasmid (Promega) and TOPO-XL plasmid (Invitrogen) for fragments of more than 2 kbp.

The pBAD-TOPO Thiofusion system (Invitrogen), used to construct the different deleted forms of the DSR-E protein, relies on the araBAD promoter, the control mechanisms of which involve the AraC regulatory protein. In the absence of an inducer, namely L-arabinose, the dimeric AraC protein associates with the regulatory structures of the operon and brings about the formation of a DNA loop, said loop then blocking transcription of genes placed under the control of the araBAD promoter. In the presence of L-arabinose, in contrast, AraC forms a complex which releases the DNA loop and allows transcription initiation. Basal expression can be limited by adding glucose to the culture medium: this reduces the level of cyclic AMP and thus the concomitant activation of the CAP protein (catabolite activator protein). The level of activation obtained is a function of the concentration of L-arabinose, so that the optimum conditions for production of the protein of interest can be selected with accuracy.

Further, the use of this vector can allow a 12 kDa thioredoxin tag to be positioned on the N-terminal end of the protein of interest. This fusion encourages the translation of the gene encoding said protein of interest. The tag protein can also enhance the solubility of the protein to which it is fused. The pBAD-TOPO Thiofusion system is designed to allow ready elimination of the thioredoxin tag by simple cleavage using enterokinase. Finally, using this expression system, a histidine tag is inserted on the C-terminal end side of the protein of interest. Such a tag is used to purify said protein by affinity.

Within the context of using this system, the E. coli TOP 10 strain was cultivated on LB medium supplemented with 100 μg/ml of ampicillin.

2) Gel Electrophoresis, Location and Characterization of Enzyme:

After culturing L. mesenteroides NRRL B-1299 for 7 h, the medium was centrifuged (7000 rpm, 4° C., 30 min) and the cells, in which 90% of the enzymatic activity was found, were concentrated 10 times in an acetate buffer solution (20 mM, pH 5.4), then heated for 5 minutes at 95° C. in the presence of denaturing solution (tris HCl 62.5 mM, SDS 4%, urea 6 M, bromophenol blue 0.01% and β-mercaptoethanol 200 mM). 300 μl of the mixture was deposited on 7% polyacrylamide gel. After migration, the total proteins were revealed by amido black staining, while the dextransucrase activity was detected by synthesizing the dextran in situ and then staining the polymer with Schiff's reagent. The bands corresponding to the active dextransucrases were excised and incubated separately in 2 ml of 20 mM sodium acetate solution, pH 5.4, containing 100 g/l of saccharose and 50 g/l of maltose. After total consumption of saccharose, the reaction was stopped by heating to 95° C. for 5 minutes, and the reaction medium was centrifuged for 5 minutes at 15000 g to eliminate the insoluble dextran. The samples were analyzed by reverse phase chromatography (C18 column, Ultrasep 100, 6 μm, 5×300 mm, Bishoff Chromatography) using ultrapure water as the eluent, at a constant flow rate of 0.5 ml/min. The oligosaccharides were separated for 30 minutes at ambient temperature and detected by refractometry. Peptide sequencing was carried out on the selected protein bands by the Laboratoire de Microséquençage, Institut Pasteur, Paris.

3) Molecular Biological Techniques Used

Purification of the E. coli plasmid and purification of the genomic DNA of L. mesenteroides were carried out using the QiaPrep Spin Plasmid kit and the Cell Culture DNA maxi kit (Qiagen) respectively. The amplification and cloning methods were carried out using standard techniques (Sambrook and Russell, 2001, supra). Restriction and modification enzymes from New England Biolabs or Gibco BRL were used in accordance with the manufacturer's instructions.

PCR was carried out with primers selected on the basis of the protein sequence obtained on an isolated band from gel electrophoresis (see supra, gel electrophoresis and enzyme localization). Two peptides were selected:

-   29-FYFESGK (SEQ ID NO: 18); and
-   24-FESQNNNP (SEQ ID NO: 19)

and used to synthesize the degenerate oligonucleotides indicated in Table I below.

In this table, the numbering of which follows that of SEQ ID No: 4, it appears that the presence of a serine residue in the two peptides necessitates the synthesis of two primers for each peptide since serine can be encoded by six different codons. ECHO-dir and ECHO-inv are primers which allowed amplification of dsrE by PCR for cloning into the ECHO Cloning (Invitrogen) expression system.

TABLE 1 (SEQ ID NOS: 18-27)

Designation   Description               Sequence 5′-3′
29-dir1       FYFESGK                   TT(C/T)TA(C/T)TT(C/T)GA(A/G)TCAGG(C/G)AA(A/G)
29-dir2       FYFESGK                   TT(C/T)TA(C/T)TT(C/T)GA(A/G)AGCGG(C/G)AA(A/G)
24-inv1       FESQNNNP                  (T/G)GG(G/A)TT(G/A)TT(G/A)TTTTGTGA(T/C)TCAAA
24-inv2       FESQNNNP                  (T/G)GG(G/A)TT(G/A)TT(G/A)TTTTGGCT(T/C)TCAAA
IPCR-rev      sequence nt 5769-5798     CCCTTTACAAGCTGATTTTGCTTATCTGCG
IPCR-dir      sequence nt 8311-8342     GGGTCAAATCCTTACTATACATTGTCACACGG
ECHO-dir      sequence nt -6 to 39      AGTTGTATGAGAGACATGAGGGTAATTTGTGACCGTAAAAAATTG
ECHO-inv      sequence nt 8457-8504     ATTTGAGGTAATGTTGATTTATCACCATCAAGCTTGAAATATTGACC

PCR

PCR was carried out using a Perkin-Elmer thermocycler, model 2400, with 50 nanograms of genomic DNA. The primers 29-dir1 and 24-inv1 were each used at 10 μM. 250 μM of each deoxynucleotide triphosphate and Taq polymerase were added to the reaction mixture.

After 25 cycles of amplification (94° C. for 30 seconds, then 50° C. for 30 seconds, then 72° C. for 5 minutes), a 666 base pair fragment was obtained.
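The degenerate notation used for the primers of Table I, in which "(C/T)" stands for an equimolar choice between two bases, can be expanded programmatically. The sketch below enumerates every sequence covered by 29-dir1 and 29-dir2 and checks that each one encodes the peptide FYFESGK. The function names are illustrative, and the codon table is only the subset of the standard genetic code needed here.

```python
import itertools
import re

PRIMER_29_DIR1 = "TT(C/T)TA(C/T)TT(C/T)GA(A/G)TCAGG(C/G)AA(A/G)"
PRIMER_29_DIR2 = "TT(C/T)TA(C/T)TT(C/T)GA(A/G)AGCGG(C/G)AA(A/G)"

# Subset of the standard genetic code covering the codons of FYFESGK.
CODONS = {
    "TTC": "F", "TTT": "F", "TAC": "Y", "TAT": "Y",
    "GAA": "E", "GAG": "E", "TCA": "S", "AGC": "S",
    "GGC": "G", "GGG": "G", "AAA": "K", "AAG": "K",
}


def expand_degenerate(primer):
    """List every plain sequence described by a primer written with (X/Y) groups."""
    parts = re.findall(r"\(([ACGT/]+)\)|([ACGT]+)", primer)
    choices = [alt.split("/") if alt else [fixed] for alt, fixed in parts]
    return ["".join(combo) for combo in itertools.product(*choices)]


def translate(seq):
    """Translate a sequence whose length is a multiple of three."""
    return "".join(CODONS[seq[i:i + 3]] for i in range(0, len(seq), 3))


variants = expand_degenerate(PRIMER_29_DIR1)
print(len(variants), {translate(v) for v in variants})
```

Each primer has six two-fold degenerate positions. The two versions differ only in the serine codon (TCA or AGC), which is why two primers were needed for each peptide.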

Certain fragments were amplified using the “Expand Long Template PCR” (Roche Boehringer Mannheim) system, in accordance with the manufacturer's instructions. This system can amplify large fragments of up to about 20 kbp highly effectively. The combination of two DNA polymerases can minimize errors during the elongation phases.

Southern Hybridization and Gene Library of L. mesenteroides NRRL B-1299

Chromosomal DNA from L. mesenteroides NRRL B-1299 was digested with different restriction enzymes then separated by electrophoresis on 0.8% agarose gel in TAE 0.5× buffer.

Genomic libraries of the bacteria were transferred onto nylon Hybond N+ membranes (Amersham Pharmacia Biotech). Hybridization was carried out using the 666 base pair fragment labeled with ³²P deoxyadenosine triphosphate. The labeling reaction was carried out using the “Mega Prime DNA Labelling System Kit” (Amersham Pharmacia Biotech), followed by purification of the probe on MicroSpin S-200HR columns. Pre-hybridization and hybridization were carried out under highly stringent conditions (65° C. overnight using the normal methods) (Sambrook and Russell, 2001, supra).

Reverse PCR

The reverse PCR reaction produces a linear DNA fragment from a circular matrix using divergent primers.

Genomic DNA from L. mesenteroides NRRL B-1299 was digested with EcoRV under the conditions recommended by the manufacturer.

After re-circularization, the digestion products were used as a matrix in a reverse PCR reaction [Extrapol II DNA polymerase (Eurobio), reaction volume of 50 μl, reverse PCR reaction parameters: 25 cycles of 94° C. for 30 seconds, 51° C. for 30 seconds and 72° C. for 3 minutes]. The two primers were selected as a function of the pSB2 insert sequence as indicated in FIG. 2.
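The principle of the reverse PCR described above can be illustrated on a toy sequence. The sketch models only the geometry: a restriction fragment is ligated to itself, and two divergent primers located in the known region yield a linear product that runs across the ligation junction. The function name, its arguments and the example fragment are hypothetical and do not correspond to the actual dsrE sequence.

```python
def inverse_pcr_product(fragment, rev_end, fwd_start):
    """Top-strand product amplified from a self-ligated restriction fragment.

    fragment  : top strand of the linear restriction fragment (5'->3')
    rev_end   : index just past the last base covered by the outward-facing
                reverse primer (it is extended toward the fragment start)
    fwd_start : index of the first base of the outward-facing forward primer
                (it is extended toward the fragment end)

    After circularization the product runs from the forward primer to the end
    of the fragment, crosses the ligation junction, and continues from the
    start of the fragment up to the reverse primer.
    """
    if not 0 <= rev_end <= fwd_start <= len(fragment):
        raise ValueError("primers must be divergent and lie within the fragment")
    return fragment[fwd_start:] + fragment[:rev_end]


# Toy fragment: the known region spans indices 5 to 8, the flanks are unknown.
print(inverse_pcr_product("AAACCCGGGTTT", rev_end=5, fwd_start=8))
```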

FIG. 2 summarizes the conditions for obtaining different plasmids carrying dsrE fragments by screening the gene library and using the probes described above.

DNA Sequence and Analysis

After sequencing the peptides, degenerate primers designed according to the frequency of use of codons in the dextransucrase genes of L. mesenteroides NRRL B-1299 were synthesized and allowed amplification of a 666 bp fragment. Sequencing this fragment revealed strong homologies with the genes of known dextransucrases, even though it was entirely novel.

The use of this fragment as a homologous probe in Southern experiments allowed positive signals to be detected on different tracks of genomic DNA. A first HindIII library was then screened and a recombinant plasmid termed pSB2 containing a 5.6 kbp insert was purified. An analysis of the sequence of this HindIII fragment revealed an open reading frame covering the whole insert. An EcoRV library was then screened with a HindIII/EcoRV probe isolated at the N-terminal end of the 5.6 kbp HindIII insert. A recombinant plasmid termed pSB3, which tested positive by dot-blot, was shown to contain a 3.8 kbp insert which, after sequencing, was shown to contain the initiation codon for translation and the promoter region of the novel dextransucrase gene termed dsrE.

With the aim of obtaining the dsrE termination codon, reverse PCR was carried out on genomic DNA from L. mesenteroides NRRL B-1299 digested with EcoRV and re-ligated to itself, using divergent oligonucleotide primers designed from the pSB2 insert sequence. A single fragment with the expected size of 1 kbp was amplified then cloned in pGEM-T Easy to obtain the pSB4 plasmid. After sequencing, the amplified sequence located downstream of the HindIII site was found to comprise 221 bp and to contain the reading frame termination codon for dsrE, located 30 bp downstream of the HindIII restriction site.

Sequencing of the different fragments carried by the three plasmids was carried out on both strands by the company Genome Express. Nucleotide sequence analyses were carried out using “ORF Finder” (http://www.ncbi.nlm.nih.gov/gorf/gorf.html), Blast (http://www.ncbi.nlm.nih.gov/blast/blast.cqi, Altschul et al, 1997), ClustalW (http://www2.ebi.ac.uk/clustalw, Thompson et al, 1994), PRODOM (http://protein/tolulouse.inra.fr/prodom.html, Corpet et al, 2000), PFAM (http://pfam.wustl.edu.hmmsearch.shtml, Bateman et al, 2000) and SAPS (http://bioweb.pasteur.fr/segana/interfaces/saps.html, Brendel et al, 1992), all of this software being available on the Internet.

Protein Expression

Two cloning and expression systems were used to produce recombinant proteins in E. coli, namely the ECHO-Cloning and pBAD-TOPO Thiofusion (Invitrogen) systems.

By way of example, the method for cloning the nucleotide sequence encoding the DSR-E protein using the ECHO-Cloning system will now be briefly described.

Two primers as proposed in Table I above were used for amplification using the “Expand Long Template” system under the following conditions: 94° C. for 3 minutes, followed by 25 cycles at 94° C. for 30 seconds, 55° C. for 30 seconds, and 68° C. for 7 minutes. The PCR products were then cloned into the pUNI/V5-His-TOPO vector to obtain a donor vector (pUNI-dsrE) to be recombined with an acceptor vector (pCR-T7-E) and adapted for expression in E. coli. The final plasmid was designated pCR-T7-dsrE.

This construction, placing the dsrE gene under the control of the bacteriophage T7 promoter, allowed inducible expression of the dsrE gene.

After induction with 1 mM of IPTG, the transformed E. coli BL21 cells were harvested by centrifuging after 4 hours of growth and re-suspended at a final optical density of 80 at 600 nm in a 20 mM sodium acetate buffer, pH 5.4, containing 1% Triton X100 (v/v), in the presence of 1 mM of PMSF to prevent proteolysis in the cell extracts after sonication.

Similar experiments carried out with the pBAD-TOPO Thiofusion system allowed the recombinant vector pBAD-TOPO-dsrE to be constructed.

Enzymatic Tests

The enzymatic reactions were carried out under standard conditions at 30° C. in a 20 mM sodium acetate buffer, pH 5.4, NaN₃ 1 g/l and saccharose, 100 g/l. The activity of the DSR-E enzyme was determined by measuring the rate at which the reducing sugars were liberated, represented here by fructose, using the dinitrosalicylic acid method which is well known to the skilled person. One unit is defined as the quantity of enzyme which would catalyze the formation of 1 μmol of fructose per minute under standard conditions. The oligosaccharides were synthesized in a reaction medium containing 100 g/l of maltose, 200 g/l of saccharose and 0.5 units/ml of DSR-E.
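The unit definition above translates into a one-line calculation. The sketch below assumes that the reducing sugar assay result is expressed as grams of fructose per liter of reaction medium and uses the molar mass of fructose (180.16 g/mol). The function name and the example values are hypothetical.

```python
FRUCTOSE_MOLAR_MASS = 180.16  # g/mol


def units_per_ml(fructose_g_per_l, minutes):
    """Activity in units per ml of reaction medium.

    One unit releases 1 micromole of fructose per minute. Since 1 g/l equals
    1 mg/ml, the released fructose is converted from mg/ml to micromol/ml and
    divided by the reaction time.
    """
    micromol_per_ml = fructose_g_per_l * 1000.0 / FRUCTOSE_MOLAR_MASS
    return micromol_per_ml / minutes


# Hypothetical measurement: 1.8016 g/l of fructose released in 10 minutes.
print(units_per_ml(1.8016, 10))
```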

As for the dextran synthesis, the enzymatic reaction was continued for 24 hours in the presence of 100 g/l of glucose. The dextran produced was precipitated in the presence of 50% (v/v) ethanol and washed twice in 50% ethanol (v/v) prior to freeze drying. It was then dissolved at a concentration of 10 mg/ml in D₂O and analyzed by ¹³C NMR spectrometry.

HPLC Separation

100 μl samples were removed and heated at 95° C. for 5 minutes then diluted in ultrapure water to obtain a final concentration of total sugars of less than 5 g/l. After centrifuging, the residual substrates and the different species formed were analyzed by HPLC on a C18 column (Ultrasep 100, 6 μm, 5×300 mm, Bishoff Chromatography).
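As a rough guide to the dilution step, the sketch below computes the smallest dilution factor that brings a sample down to the 5 g/l limit. It assumes that the total sugar concentration of the undiluted sample is known; for the oligosaccharide synthesis medium described above this would be about 300 g/l (100 g/l of maltose plus 200 g/l of saccharose). The function name is illustrative.

```python
def dilution_factor_limit(total_sugars_g_per_l, limit_g_per_l=5.0):
    """Dilution factor at which the sample reaches exactly the limit.

    Any dilution greater than the returned value gives a total sugar
    concentration below the limit.
    """
    if total_sugars_g_per_l <= 0 or limit_g_per_l <= 0:
        raise ValueError("concentrations must be positive")
    return total_sugars_g_per_l / limit_g_per_l


# 100 g/l maltose + 200 g/l saccharose: dilute more than 60-fold.
print(dilution_factor_limit(300.0))
```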

The oligosides were separated at ambient temperature for 30 minutes in ultrapure water used as the eluent, at a flow rate of 0.5 ml/min. Detection was carried out by refractometry.

These conditions allowed the following species to be separated: fructose, maltose, leucrose, saccharose, and oligosides with a degree of polymerization that did not exceed 6.

Calculation of Yields

The method for calculating the yields for the oligoside synthesis reactions took into account the residual concentration of the acceptor in accordance with the following formula:

R = {[final GOS] − [initial GOS]} / {0.474 × [saccharose consumed] + [acceptor consumed]}

in which R represents the real yield of the total GOS synthesis reaction, the concentrations being expressed in g/l.
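The yield formula can be written as a small function. The coefficient 0.474 is taken directly from the formula; reading it as the mass fraction of saccharose that can be transferred as a glucosyl residue (about 162/342) is an interpretation, not a statement made in the text. The function name and the example concentrations are hypothetical.

```python
GLUCOSYL_FRACTION = 0.474  # coefficient applied to saccharose in the formula


def gos_yield(gos_final, gos_initial, saccharose_consumed, acceptor_consumed):
    """Real yield R of the total GOS synthesis; all concentrations in g/l."""
    denominator = GLUCOSYL_FRACTION * saccharose_consumed + acceptor_consumed
    if denominator <= 0:
        raise ValueError("no substrate consumed")
    return (gos_final - gos_initial) / denominator


# Hypothetical run: 50 g/l of GOS formed from 100 g/l of saccharose and
# 52.6 g/l of acceptor consumed.
print(round(gos_yield(50.0, 0.0, 100.0, 52.6), 3))
```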

Construction of Different Deleted Forms of DSR-E Protein

The different deleted forms of the DSR-E protein [FIG. 1 c) to 1 i)] were obtained by PCR amplification of fragments corresponding to the dsrE gene then cloning in the pBAD-TOPO Thiofusion vector described above. The primers used for amplification of the regions selected from the dsrE gene are shown in Table II below. The positions of the primers are shown with respect to SEQ ID No: 5, relating to the sequence for the dsrE gene. The bases mutated to introduce the NcoI restriction site are shown in bold and the resulting NcoI site is underlined.

TABLE 2 (SEQ ID NOS: 28-34)

Designation         Positions    Sequence 5′-3′
pBAD-PS/ZV-dir      344-373      G CCATGG CAAATACGATTGCAGTTGACACG
pBAD-ZV/CD1-dir     971-1001     G CCATGG ACGGTAAAACCTATTTTCTTGACG
pBAD-CD1/GBD-dir    3656-3682    TCCATGGGTGAAAAAACAAGCACCGGC
pBAD-GBD/CD2-dir    6167-6189    ACCATGGATATGTCTACTAATGC
pBAD-CD1/GBD-inv    3638-3658    TAACTGTTTAGGCAAGAATCC
pBAD-GBD/CD2-inv    6146-6172    TAATGTATTAGTGAATAAGTATTCACC
pBAD-ent-inv        8714-8737    AATTTGAGGTAATGTTGATTTATC

The above direct and reverse primers were designed to ensure translational fusion of the truncated protein forms with the N-terminal thioredoxin tag and the C-terminal polyhistidine tag, respecting the open reading frames of the regions encoding said forms.

Since the pBAD-TOPO Thiofusion plasmid contains a specific restriction site for the NcoI enzyme located at the 5′ end of the region encoding thioredoxin, a second NcoI site is introduced into each direct primer to enable excision of that region if required.
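The presence of the engineered NcoI site in the direct primers of Table II can be confirmed with a short search for the NcoI recognition sequence CCATGG. The primer sequences below are those of the table with the spacing removed; the dictionary and function names are illustrative.

```python
NCOI_SITE = "CCATGG"  # recognition sequence of NcoI

DIRECT_PRIMERS = {
    "pBAD-PS/ZV-dir": "GCCATGGCAAATACGATTGCAGTTGACACG",
    "pBAD-ZV/CD1-dir": "GCCATGGACGGTAAAACCTATTTTCTTGACG",
    "pBAD-CD1/GBD-dir": "TCCATGGGTGAAAAAACAAGCACCGGC",
    "pBAD-GBD/CD2-dir": "ACCATGGATATGTCTACTAATGC",
}


def ncoi_positions(primers):
    """Map each primer name to the 0-based index of its NcoI site (-1 if absent)."""
    return {name: seq.find(NCOI_SITE) for name, seq in primers.items()}


print(ncoi_positions(DIRECT_PRIMERS))
```

In all four direct primers the site starts at the second base, immediately after a single 5′ nucleotide.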

The PCR amplification reactions were carried out using the “Expand Long Template” system under the following conditions: pre-denaturing at 94° C. for 3 minutes followed by 25 cycles at 94° C. for 30 seconds, 52° C. for 30 seconds and 68° C. for 7 minutes.
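The programmed duration of such a thermocycling protocol follows directly from the step times. The sketch below adds up the hold times only; ramping between temperatures, which depends on the instrument, is deliberately ignored, and the function name is illustrative.

```python
def program_minutes(initial_seconds, cycles, step_seconds):
    """Total programmed hold time of a PCR protocol, in minutes.

    Ramp times between temperature steps are not included.
    """
    return (initial_seconds + cycles * sum(step_seconds)) / 60


# 94 C for 3 min, then 25 cycles of 30 s, 30 s and 7 min.
print(program_minutes(180, 25, (30, 30, 420)))
```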

The amplification products generated were then cloned into the pBAD-TOPO Thiofusion vector for subsequent transformation of the E. coli TOP 10 strain. Recombinant clones were selected and their restriction profiles were analyzed to identify a recombinant plasmid carrying the insertion orientated as expected for each of the investigated forms.

EXAMPLE 1 Characterization and Purification of the DSR-E Enzyme and Obtaining the dsrE Gene

The enzymes produced by L. mesenteroides cultures and obtained on a polyacrylamide gel in SDS as described in the Materials and Methods section were isolated by cutting the gel.

The GOSs produced by the isolated enzymes were analyzed by HPLC using the methods described in (1). The enzyme the activity of which was being investigated was deduced from the nature of the GOSs produced. After tryptic proteolysis and separation of the peptides produced by HPLC, 2 peptides, 29-FYFESGK (SEQ ID NO: 18) and 24-FESQNNNP (SEQ ID NO: 19), were sequenced and used as a model for the synthesis of degenerate nucleotide primers.

The different amplification and cloning steps are shown in FIG. 2. The complete gene was inserted into the pCR-T7-E plasmid and expressed in E. coli.

The production of a functional enzyme was attested by the production of GOSs the HPLC analysis of which is shown in FIG. 6 b).

The size of peaks 5 and 7, representing GOSs with an α(1→2) linkage, should in particular be noted.

EXAMPLE 2 Characterization of dsrE and DSR-E Sequences

2.1 Nucleotide Sequence

The nucleotide sequence encoding the enzyme is shown in SEQ ID No: 4. It is composed of a reading frame of 8506 nucleotides.

The nucleotide sequence inserted into the pCR-T7-dsrE plasmid contained a ribosome binding site (RBS), located 9 bases upstream of the ATG initiation codon and composed of the hexanucleotide GAGGAA.

2.2 Analysis of Amino Acid Sequence

The 8506 nucleotide dsrE sequence encodes a 2835 amino acid protein shown in SEQ ID No: 2. The isoelectric point for this protein is 4.88 and its theoretical molecular weight is 313.2 kDa. Despite strong similarities with known dextransucrases, DSR-E is characterized by an original structure.

Alignment of the amino acid sequence with known glycosyltransferases and dextransucrases confirmed that the domain structure of glycosyltransferases and dextransucrases was conserved, namely: a signal sequence, a variable zone, a highly conserved catalytic domain and a glucan binding domain. This structure is shown in FIG. 1 a).

As indicated in FIG. 1 b), a second catalytic domain forms the carboxy-terminal portion of the enzyme, as confirmed by PRODOM and Blast analysis.

With a molecular weight of 313.2 kDa, DSR-E had about twice the mean molecular weight of other glycosyltransferases and dextransucrases (1), which is in agreement with the presence of a second catalytic domain at the C-terminal end and also with a longer glucan binding region.

a) Analysis of Signal Sequence:

The signal sequence and the nucleotide sequence encoding the signal peptide were highly conserved when compared with other dextransucrases, as shown in FIG. 3. The cleavage site is located between amino acids 40 and 41.

b) Variable Domain:

Downstream of the signal peptide, DSR-E had a 207 amino acid variable domain. When it was compared with other variable glycosyltransferase domains, using a SAPS type alignment program, the presence of a 14 amino acid motif repeated 11 times was revealed, as indicated in FIG. 4.

This alanine-, threonine- and aspartic acid-rich repeat motif has never before been identified.

The role and significance of this region has never been elucidated. Different studies have shown that its deletion does not affect enzymatic activity (4). The role of the 14 amino acid repeat motif, which does not exist in other glycosyltransferases, remains to be determined, however.

c) Analysis of Catalytic Domains:

The first catalytic domain extends from amino acids 248 to 1142 (CD1) of SEQ ID No: 2, while the second is located between amino acids 1980 and 2836 (CD2). These two domains have 45% identity and 65% similarity between them.
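Percentage identity figures of this kind are computed from a pairwise alignment. The sketch below shows one common convention: identical columns divided by the number of columns in which neither sequence has a gap. Other conventions use a different denominator, so this is an assumption rather than the method used by the alignment software cited in this document. The toy sequences are hypothetical and are not taken from CD1 or CD2.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over gap-free alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# Toy alignment: 5 gap-free columns, 4 of them identical.
print(percent_identity("ADGW-E", "ADAWKE"))
```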

CD1 and CD2 contain amino acids already identified in glycosyltransferases and dextransucrases as being essential to their enzymatic activity, as shown in FIG. 5.

The catalytic triads of CD1 and CD2 determined by analogy with α amylase (7) are present in the following positions: Asp 527/Glu 565/Asp 638 for CD1 and Asp 2210/Glu 2248/Asp 2322 for CD2.

Other conserved residues were identified as being important for enzymatic activity: the residues Trp 425/Glu 430 for CD1 and Trp 2122/Glu 2127 for CD2, which are analogous to those of the N-terminal domain of GTF-1 described by Monchois et al (4): Trp 344/Glu 349.

In contrast, certain sequences located in the conserved region of the glycosyltransferases and dextransucrases are not found in the CD2 of DSR-E. Thus, as indicated in FIG. 5 below, the sequences FIHNDT (SEQ ID NO: 35) (2214-2220) and KGVQEKV (SEQ ID NO: 36) (2323-2329) diverge from other consensus sequences of dextransucrases already studied, which are respectively NVDADLL (SEQ ID NO: 37) and SEVQTVI (SEQ ID NO: 38).

d) Glucan Binding Domain:

When the DSR-E sequence is compared with known sequences, it appears that the glucan binding region is substantially longer. In fact, the length of this domain is about 500 amino acids in the glycosyltransferases and dextransucrases being studied while in DSR-E, it represents 836 amino acids. Several A and C repeat motifs, in particular a series of AC repetitions, have been identified. Table III below shows the consensus sequences of the repeat motifs of GBD, in particular the A and C motifs, described in the literature relating to dextransucrases of Leuconostoc and Streptococcus spp.

TABLE 3 (SEQ ID NOS: 39-43)

Motif   Consensus sequence
A       WWYFNxDGQAATGLQTIDGQTVFDDNGxQVKG
B       VNGKTYYFGSDGTAQTQANPKGQTFKDGSGVLRFYNLEGQYVSGSGWY
C       DGKIYFFDPDSGEVVKNRFV
D       GGVVKNADGTYSKY
N       YYFxAxQGxxxL

x: any amino acid

EXAMPLE 3 Expression of dsrE in E. coli

E. coli BL21 (DE3) pLysS pCR-T7-dsrE cells were cultivated as described above. After polyacrylamide gel electrophoresis (SDS-PAGE), analysis of the protein extracts revealed the presence of several bands having dextransucrase activity, said activity being measured as described above.

The E. coli JM109 [pCR-T7-dsrD] line was deposited at the CNCM on 15th Mar. 2001 with accession number I-2649.

Identification and Characterization of Enzymatic Activity

Using a glucose acceptor molecule, the dextransucrases produced by recombinant E. coli were compared with those produced by L. mesenteroides NRRL B-1299.

HPLC analysis of the reaction products with recombinant DSR-E (FIG. 6) showed retention times corresponding to the previously identified GOSs R4 and R5 (2). Type R oligosaccharides are linear GOS series, the α(1→2) bond being linked to the non-reducing end. The OD series, composed of linear GOSs resulting from glycoside α(1→6) bonds with a maltose residue at the reducing end, was observed in very small quantities. Three novel compounds, in contrast, were detected in the recombinant enzyme products.

Identification of GOSs Produced:

Finally, FIG. 6 b) clearly shows that peaks 5 and 7 representing the GOSs of the R series are relatively larger with the recombinant enzyme than with the native enzyme in which the peaks corresponding to panose and OD5 are larger.

EXAMPLE 4 Effect of Deletion of CD2 on the Enzymatic Activity of DSR-E

The genomic DNA of L. mesenteroides NRRL B-1299 was used as a matrix to amplify by PCR the dsrE gene lacking the sequence corresponding to the second catalytic domain. To this end, 2 oligonucleotides, ECHO-dir (5′-AGTTGTATGAGAGACATGAGGGTAATTTGTGACCGTAAAAAATTG) (SEQ ID NO: 48), corresponding to the nucleotide sequence -6 to 39 and containing the translation initiation codon, and ECHO-inv-del (5′-GTATTAGTGAATAAGTATTCACCATTGCATTTATCGTCAAAATAGTACG) (SEQ ID NO: 49), complementary to the sequence 5889-5937 and corresponding to the peptide sequence YYFDDKGNGEYCFTNT (SEQ ID NO: 44), were synthesized in order to fuse the C-terminal end of the deleted protein with a His tag present on the cloning vector. The PCR reaction was carried out using a DNA thermal cycler model 2400 (Perkin Elmer) with the Expand Long Template System (Boehringer Mannheim) using the following temperature cycle: 94° C. for 3 min, then 25 cycles with: 30 s at 94° C., 30 s at 55° C. and 7 min at 68° C. The PCR product was then cloned into the pUNI donor vector and the resulting plasmid was used in a recombination reaction with the pCR-T7-ΔdsrE expression vector.

The cell extract, preparation, enzymatic reaction and reaction product analysis were those described in Example 3 above.

The HPLC profile of the GOSs obtained with the DSR-E enzyme from which the CD2 domain has been deleted appears in FIG. 6 c).

The type R GOSs shown as peaks 5 and 7 in FIGS. 6 a) and 6 b) are entirely absent from the products obtained with the recombinant enzyme from which CD2 has been deleted. The only analyzable products were those corresponding to linear oligosides resulting from α(1→6) bonds with a maltose residue in the reducing portion. This result clearly indicates the essential role of the catalytic domain located in the carboxy-terminal portion of the enzyme in its capacity to form α(1→2) osidic bonds.

EXAMPLE 5 Study of Structure-function Relationships of DSR-E Protein

The dsrE gene, insofar as it is the first gene encoding a dextransucrase catalyzing the synthesis of α(1→2) bonds to have been cloned, is of particular interest. Thus, it is important to characterize this gene and its expression product, in this case by determining the roles of the different domains making up the DSR-E protein in the function which has been assigned thereto, namely that of a glycosyltransferase specific to the synthesis of α(1→2) bonds.

5.1 Deleted forms of DSR-E Protein:

A study of six different forms obtained by deletion of one or more domains from the DSR-E protein was envisaged in order to determine the following by reference to FIG. 1 below: (i) the influence of the presence of the CD2 domain by studying GBD-CD2 and Δ(CD2) constructions; (ii) the influence of the presence of the variable zone by analyzing the Δ(ZV) and CD1-GBD forms; and (iii) the intrinsic catalytic potential of the CD1 and CD2 domains expressed in an isolated manner (CD1 and CD2 constructions).

The catalytic activity of each of the different forms was compared with that observed with the control, Δ(PS), corresponding to the entire form deleted only of the signal peptide [FIG. 1 c)].

5.2 Analysis of Constructions:

At the end of the experimental PCR amplification and cloning procedure detailed above, several clones with an insertion in the expected orientation were obtained for each of the envisaged constructions, with the exception of the truncated GBD-CD2 form, for which the desired amplification product could not be cloned.

The sequences of the insertions were determined in order to ensure the absence of mutations that, after translation, might modify amino acids located at positions presumed essential for the enzymatic activity of the encoded protein.

A mutation was identified at the 31st base of the insertion relative to the control Δ(PS), inducing substitution of an aspartic acid by an asparagine at position 10 of the variable zone. As it is not located in the repeat motifs S of the variable zone (FIG. 4), the effect of this mutation on the function finally observed appears to be negligible.

A mutation was introduced into the amplification product corresponding to the construction Δ(CD2), changing the aromatic residue F1411 to leucine. This mutation was located in the first third of the glucan binding domain GBD, at a junction between two repeat motifs.

Because of errors made by the polymerase during PCR amplification, the construction Δ(ZV) did not have the expected sequence. In fact, the insertion contained an open reading frame essentially corresponding to the GBD-CD2 form, which could not be cloned. However, in the GBD-CD2 form thus obtained in place of Δ(ZV), 46 N-terminal residues were absent. Now, the GBD domain has more than 800 amino acids forming a concatenation of 24 repeat units. This concatenation is such that, of the 46 truncated residues, only the last 9 were located in one of said units, namely the first. It thus appears plausible to consider that deletion of these amino acids has no influence on the enzymatic reaction catalyzed by the corresponding protein form. This hypothesis is supported by the fact that in other dextransucrases, the loss of a certain number of repeat units from the GBD domain does not significantly reduce the activity of the resulting protein.

The insertion encoding the CD1-GBD form contained a mutation affecting the F633 residue located in the CD1 domain and more precisely in the region that is highly conserved in dextransucrases, itself located just in front of the second aspartic acid of the catalytic triad (FIG. 5). The expected phenylalanine was substituted by a leucine. It is difficult at this stage to estimate the impact of such a mutation on the observed catalytic activity.

The sequence of the insertions encoding the catalytic domains CD1 and CD2 was determined in the same manner as for the other constructions.

5.3 Expression Products and Enzymatic Activities

The proteins corresponding to the various deleted forms of DSR-E were expressed by subjecting the recombinant E. coli cells to induction by L-arabinose at a concentration of 0.002%. The enzymatic activity was observed for the first four hours following induction.

The protein extracts obtained by sonication of the cell residues were analyzed by SDS-PAGE electrophoresis (Sambrook and Russel, 2001, supra). The molecular masses of the recombinant proteins were estimated from the electrophoretic profiles obtained, said masses essentially corresponding to the expected masses taking into account the 12 kDa increment linked to the thioredoxin tag. Table IV below summarizes the estimated values for the molecular masses of the different truncated forms and, by way of comparison, provides the expected masses.

TABLE IV

| Protein form | Expected mass (kDa) | Expected mass + thioredoxin (kDa) | Estimated mass (kDa) |
|---|---|---|---|
| Δ(PS) | 309 | 321 | 324 |
| Δ(CD2) | 218 | 230 | ND |
| GBD-CD2 | 224 | / | 233 |
| CD1-GBD | 193 | 205 | 199 |
| CD1 | 99 | 111 | 111 |
| CD2 | 95 | 107 | ND |

ND: not determined
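The comparison made in Table IV can be sketched as follows. This Python snippet simply adds the 12 kDa thioredoxin increment stated in the text to each expected mass (except for GBD-CD2, which carries no tag) and reports the relative deviation of the estimated mass. The dictionary layout and function names are illustrative assumptions, not part of the described method.

```python
# Values transcribed from Table IV; None marks "ND" (not determined).
THIOREDOXIN_TAG_KDA = 12  # increment stated in the text

# form: (expected mass in kDa, carries thioredoxin tag?, estimated mass in kDa)
table_iv = {
    "Δ(PS)": (309, True, 324),
    "Δ(CD2)": (218, True, None),
    "GBD-CD2": (224, False, 233),
    "CD1-GBD": (193, True, 199),
    "CD1": (99, True, 111),
    "CD2": (95, True, None),
}


def expected_with_tag(mass, tagged):
    """Expected mass of the expressed protein, including the tag if present."""
    return mass + THIOREDOXIN_TAG_KDA if tagged else mass


def deviation_percent(form):
    """Relative deviation (%) of the estimated mass from the expected mass."""
    mass, tagged, estimated = table_iv[form]
    if estimated is None:
        return None
    target = expected_with_tag(mass, tagged)
    return 100.0 * (estimated - target) / target


for form, (mass, tagged, estimated) in table_iv.items():
    dev = deviation_percent(form)
    dev_text = "ND" if dev is None else f"{dev:+.1f}%"
    print(f"{form:8s} expected {expected_with_tag(mass, tagged):3d} kDa, deviation {dev_text}")
```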

Table V below indicates the nature and position of the amino acids marking the start and end of the protein forms constructed in this study. The different positions refer to SEQ ID NO: 2 corresponding to the protein DSR-E.

TABLE V

| Protein form | Starting amino acid | Ending amino acid | Total length |
|---|---|---|---|
| Δ(PS) | N41 | I2835 | 2795 |
| Δ(CD2) | M1 | L1980 | 1980 |
| GBD-CD2 | M1188 | I2835 | 1648 |
| CD1-GBD | I248 | L1980 | 1733 |
| CD1 | I248 | Q1141 | 894 |
| CD2 | D1981 | I2835 | 855 |
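The total lengths in Table V follow from the start and end positions as end - start + 1. A minimal Python sketch of that arithmetic is given below; the variable names are illustrative and the values are transcribed from the table.

```python
# form: (start position, end position, total length listed in Table V)
table_v = {
    "Δ(PS)": (41, 2835, 2795),
    "Δ(CD2)": (1, 1980, 1980),
    "GBD-CD2": (1188, 2835, 1648),
    "CD1-GBD": (248, 1980, 1733),
    "CD1": (248, 1141, 894),
    "CD2": (1981, 2835, 855),
}


def segment_length(start, end):
    """Number of residues from start to end, both positions included."""
    return end - start + 1


for form, (start, end, listed) in table_v.items():
    computed = segment_length(start, end)
    status = "matches" if computed == listed else "differs from"
    print(f"{form:8s} {start:4d}-{end:4d}: {computed} residues ({status} the listed value)")
```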

The GBD-CD2 form did not have a thioredoxin tag. In fact, this form arose by chance from the procedure for PCR amplification of the sequence assumed to encode the Δ(ZV) form. Because of the deletions in the sequences thus generated, the thioredoxin tag, in principle situated 5′ of the protein of interest, could not be fused with the GBD-CD2 region.

The quality of the electrophoresis gels did not allow determination as to whether the level of expression of the different forms was quantitatively identical and, as a result, whether said forms were present in the same proportions in the cell extracts.

The activity measurements provided were established on the basis of a given volume of cell extracts but could not be extrapolated to the quantity of each protein actually contained in said volume of extracts.

The synthesis of dextran polymers in situ by incubating electrophoresis gels in a saccharose solution and subsequent staining with Schiff's reagent confirmed the presence of proteins having a glucan-saccharase activity in cell extracts corresponding to Δ(PS), Δ(CD2), GBD-CD2 and CD1-GBD.

Table VI below shows the maximum enzymatic activities observed for each construction. The results confirm the data drawn from the experiments in which the gels were stained with Schiff's reagent, namely the fact that the cell extracts corresponding to the forms Δ(PS), Δ(CD2), GBD-CD2 and CD1-GBD had a saccharase activity, in contrast to the two catalytic domains taken in isolation. This result was in agreement with the literature, given that it has been demonstrated that in other dextransucrases, the absence of the GBD domain induced a drastic loss of enzymatic activity (8, 9, 10).

TABLE VI

| Protein form | Δ(PS) | Δ(CD2) | GBD-CD2 | CD1-GBD | CD1 | CD2 |
|---|---|---|---|---|---|---|
| Maximum activity (U/l) | 1063 | 181 | 86 | 235 | 5.3 | 0 |
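To make the relative magnitudes in Table VI easier to read, the following Python sketch expresses each maximum activity as a percentage of that of the control Δ(PS). The values are transcribed from the table. As noted in the text, these activities are per volume of cell extract and are not normalized to the amount of each protein, so the percentages are only indicative. The names used are illustrative.

```python
# Maximum activities transcribed from Table VI (per volume of cell extract).
max_activity = {
    "Δ(PS)": 1063,
    "Δ(CD2)": 181,
    "GBD-CD2": 86,
    "CD1-GBD": 235,
    "CD1": 5.3,
    "CD2": 0,
}

CONTROL = "Δ(PS)"  # entire form deleted only of the signal peptide


def relative_activity(form):
    """Activity of a form as a percentage of the control activity."""
    return 100.0 * max_activity[form] / max_activity[CONTROL]


relative = {form: relative_activity(form) for form in max_activity}

for form, pct in relative.items():
    print(f"{form:8s} {max_activity[form]:>7} -> {pct:6.2f}% of {CONTROL}")
```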

The intrinsic activity of the CD1 form was too low to be detected. The GBD-CD2 form had a non-negligible activity, which leads to the conclusion that the corresponding structural organization, namely a catalytic domain downstream of the glucan binding domain, remains enzymatically active.

5.4 Effect of Deletions on Oligoside Synthesis:

To determine whether the specificity of the synthesis of α(1→2) bonds was conserved during the reaction in the presence of an acceptor, experiments for synthesizing oligosides starting from maltose were carried out (FIG. 7).

When the reactions were carried out to completion, i.e. when all of the saccharose had been consumed, the oligoside synthesis yields were calculated. The results are shown in Table VII below. Only the reaction involving the cell extract containing the protein form CD1 did not allow such a calculation. The temperature effect probably resulted in inactivation of the very low activity present in the protein extract.

TABLE VII

| Protein form | Yield of oligosides in OD series (%) | Yield of oligosides in R series (%) | Total oligoside yield (%) |
|---|---|---|---|
| Native enzyme | 36 | 28 | 64 |
| Δ(PS) | 41 | 14 | 55 |
| Δ(CD2) | 67 | 1 | 68 |
| GBD-CD2 | 45 | 47 | 92 |
| CD1-GBD | 100 | 0 | 100 |
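The totals in Table VII are the sum of the OD-series and R-series yields. The Python sketch below reproduces that sum and also computes the share of R-series oligosides within the total, which is a derived figure not given in the table. Values are transcribed from Table VII and the names are illustrative.

```python
# form: (yield in OD series %, yield in R series %, total yield % listed in Table VII)
table_vii = {
    "Native enzyme": (36, 28, 64),
    "Δ(PS)": (41, 14, 55),
    "Δ(CD2)": (67, 1, 68),
    "GBD-CD2": (45, 47, 92),
    "CD1-GBD": (100, 0, 100),
}


def total_yield(od, r):
    """Total oligoside yield as the sum of the two series."""
    return od + r


def r_share(od, r):
    """Share (%) of R-series oligosides within the total oligoside yield."""
    return 100.0 * r / (od + r)


for form, (od, r, listed) in table_vii.items():
    print(f"{form:13s} total {total_yield(od, r):3d}% (listed {listed}%), "
          f"R series = {r_share(od, r):.1f}% of products")
```

On these figures, GBD-CD2 is the only form for which the R series makes up more than half of the oligosides formed.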

As indicated in FIG. 7 below, the presence of oligosides from series R was only detected with enzymatic forms having the catalytic domain CD2, with the exception of the case in which said domain was isolated and thereby rendered completely inactive. In fact, the retention times for the oligosides synthesized by the form deleted of the second catalytic domain and by the CD1-GBD form corresponded only to those in the OD series, i.e. to GOSs lacking α(1→2) bonds. These results thus indicate that the CD2 domain was required for the formation of α(1→2) bonds.

The products obtained with the GBD-CD2 form supported these observations. This construction, which had CD2 as the only catalytic domain, was capable of catalyzing in a preponderant manner the synthesis of oligosides from the R series, having α(1→2) bonds. Thus, this result demonstrates that specificity in terms of the function of the DSR-E enzyme resides in the highly original sequence of this domain, and not in the association of two catalytic domains. Further, the GBD-CD2 protein form also allowed the synthesis of α(1→6) bonds. However, the low yields obtained for these oligosides indicated that they were preferentially converted into oligosides with a higher degree of polymerization belonging to the R series, which prevented their accumulation in the reaction medium, in contrast to molecules from the R series, which were not converted (2).

By comparing the profiles of the products obtained as shown in FIG. 7, it is clear that the entire form Δ(PS) mainly synthesizes linear oligosides. In fact, the molecule R4 was absent and the oligoside R5 was only present in a small amount. The catalytic domain CD1 catalyzed the exclusive synthesis of α(1→6) bonds and its activity appeared to be preponderant with respect to that of the CD2 domain. In the entire form of the enzyme, the involvement of the CD2 domain would thus be less important because of: (i) lower intrinsic catalytic parameters; and/or (ii) a global enzyme configuration that was unfavorable to its activity.

Further, the entire enzyme Δ(PS) catalyzed the synthesis of oligosides from the R series with a lower yield than that observed with the mixture of dextransucrases produced by L. mesenteroides NRRL B-1299 (FIG. 7). The yield obtained with that mixture, 28%, was situated between those observed for the entire form Δ(PS) and for the GBD-CD2 form. It is known that the wild strain produces several forms of dextransucrases that are capable of synthesizing osidic bonds, in particular α(1→2) bonds. One hypothesis has been proposed, in which said forms are the degradation products of DSR-E. Insofar as the truncated forms of DSR-E such as GBD-CD2 could catalyze the synthesis of oligosides from the R series more effectively, it would appear that the yields obtained with the heterogeneous mixture produced by L. mesenteroides NRRL B-1299 can be attributed to the combined catalytic activities of all of said different enzymatic forms.

In conclusion, by isolating a particular dextransucrase produced by L. mesenteroides, the inventors have succeeded in characterizing a particular and unexpected structure of this enzyme, which can produce oligosides of interest having α(1→2) type linkages. Identification and characterization of this sequence allows the construction of recombinant cells or organisms specifically expressing this enzyme and also makes it possible to envisage its modification by directed or random mutagenesis or by DNA shuffling to further improve its characteristics.

This invention can also improve the yield and reproducibility of the production of GOSs of interest for the different applications cited above.

REFERENCES

-   (1) Monchois V., Willemot R. M., Monsan P. (1999). Glucansucrases: mechanism of action and structure-function relationships. FEMS Microbiol. Rev. 23, 131-151.
-   (2) Dols M., Remaud-Simeon M., Willemot R. M., Vignon M. R., Monsan P. F. (1998). Structural characterization of the maltose acceptor-products synthesised by Leuconostoc mesenteroides NRRL B-1299 dextransucrase. Carbohydrate Research 305, 549-559.
-   (3) Arnold F. H. (2001). Nature 409 n° 6817, 253.
-   (4) Monchois V., Vignon M., Russel R. R. B. (1999). Isolation of key amino-acid residues at the N-terminal end of the core region of Streptococcus downei glucansucrase GTF-I. Appl. Microbiol. Biotechnol. 52, 660-665.
-   (5) Wilke-Douglas M., Perchorowicz J. T., Houck C. M., Thomas B. R. (1989). Methods and compositions for altering physical characteristics of fruit and fruit products. PCT patent, WO 89/12386.
-   (6) Arguello-Morales M. A., Remaud-Simeon M., Pizzut S., Sarcabal P., Willemot R. M., Monsan P. (2000). Sequence analysis of the gene encoding alternansucrase, a sucrose glucosyltransferase from Leuconostoc mesenteroides NRRL B-1355. FEMS Microb. Lett. 182, 81-85.
-   (7) Devulapalle K. S., Goodman S., Gao Q., Hemsley A., Mooser G. (1997). Knowledge-based model of a glucosyltransferase from oral bacterial group of mutans streptococci. Protein Sci. 6, 2489-2493.
-   (8) Kato C. and Kuramitsu H. K. (1990). Carboxy-terminal deletion analysis of the Streptococcus mutans glucosyltransferase-1 enzyme. FEMS Microbiol. Lett. 72, 299-302.
-   (9) Lis M., Shiroza T. and Kuramitsu H. K. (1995). Role of the C-terminal direct repeating units of the Streptococcus mutans glucosyltransferase-S in glucan binding. Appl. Env. Microbiol. 61, 2040-2042.
-   (10) Monchois V., Remaud-Simeon M., Russel R. R. B., Monsan P. and Willemot R. M. (1997). Characterization of Leuconostoc mesenteroides NRRL B-512F dextransucrase (DSR-S) and identification of amino-acid residues playing a key role in enzyme activity. Appl. Microbiol. Biotechnol. 48, 465-472.

1. An isolated nucleic acid comprising SEQ ID NO:4 or its full length complementary strand.

2. An isolated nucleic acid comprising: a) a sequence encoding a dextransucrase expressed by the plasmid pCR-Ty-dsrD deposited at the CNCM with accession number I-2649; or b) a full length complementary sequence to the sequence in a).

3. An expression vector comprising a nucleic acid according to claim 1 or claim 2.

4. The expression vector according to claim 3, in which the nucleic acid is under the control of a sequence allowing its expression in prokaryotic or eukaryotic cells.

5. A host cell transformed by a nucleic acid according to claim 1.

6. A host cell transformed by a vector according to claim 3.

7. The transformed host cell according to claim 5, selected from the group comprising E. coli, Leuconostocci, plants, Lactococci and Bacilli or yeasts.

8. The transformed host cell according to claim 7, wherein said transformed host cell is a strain of E. coli deposited at the CNCM with accession number I-2649.

9. An isolated nucleic acid encoding an enzyme with glycosyltransferase activity that can form dextrans having α(1→2) linkages from saccharose, α-D-fluoroglucose, paranitrophenyl-α-D-glucopyranoside, α-D-glucopyranoside-α-D-sorbofuranoside or 4-O-α-D-galactopyranosylsucrose and comprising at least one nucleotide sequence encoding a catalytic domain of SEQ ID NO:3 and located 3′ of a sequence encoding a glucan binding domain.

10. An isolated nucleic acid consisting of SEQ ID NO:4 or its full length complementary strand.

11. A host cell transformed by a nucleic acid according to claim 9.

12. A host cell transformed by a nucleic acid according to claim 2.

13. A host cell transformed by an expression vector according to claim 4.

14. The isolated nucleic acid molecule according to claim 9, wherein the nucleotide sequence encoding the glucan binding domain is between two nucleotide sequences encoding the catalytic domains.